
fix(langgraph): surface stream-connect failures fast (configurable retry budget)#677

Merged: blove merged 2 commits into main from fix/chat-error-handling-retry on Jun 17, 2026
blove (Contributor) commented Jun 17, 2026

Problem

examples/chat e2e error-handling.spec.ts has been red on main since ~#666, blocking the Canonical demo → Vercel deploy. A failed stream no longer surfaces the error alert within the test's 15s window.

Root cause (proven)

#667 bumped @langchain/langgraph-sdk 1.7.4 → 1.9.22. The new SDK wraps every connect attempt — including the initial run stream — in AsyncCaller's p-retry (default 4 retries, exponential backoff ≈ 15s) before throwing a ConnectionError. So a failed stream no longer fails fast: the error alert surfaces ~15-20s late instead of immediately.

  • Confirmed by bisecting deps locally: SDK 1.7.4 → test passes (~3s); SDK 1.9.22 → test times out at 15s. Bumping the assertion window to 30s makes it pass at 26.9s — the alert does appear, just after the retry backoff.

This is both the CI failure and a real UX regression (a dropped connection hangs ~15s with no feedback).
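
The "≈ 15s" figure can be reproduced with a little arithmetic. The sketch below assumes p-retry's documented defaults (a 1000 ms minimum timeout that doubles on every retry), with no jitter and no upper cap. The SDK may configure these differently, and the function names here are hypothetical.

```typescript
// Assumption: delay before retry n is minTimeoutMs * factor^n (n starts at 0),
// which matches p-retry's documented defaults of minTimeout = 1000 and factor = 2.
// Jitter, request latency, and any SDK-specific overrides are ignored.
function backoffDelaysMs(retries: number, minTimeoutMs = 1000, factor = 2): number[] {
  return Array.from({ length: retries }, (_, attempt) => minTimeoutMs * Math.pow(factor, attempt));
}

// Total time spent waiting between attempts before the final error is thrown.
function totalBackoffMs(retries: number, minTimeoutMs = 1000, factor = 2): number {
  return backoffDelaysMs(retries, minTimeoutMs, factor).reduce((sum, delay) => sum + delay, 0);
}

const defaultDelays = backoffDelaysMs(4); // 1000, 2000, 4000, 8000
const defaultTotal = totalBackoffMs(4);   // 15000 ms of pure waiting
const failFastTotal = totalBackoffMs(0);  // 0 ms: the first failure surfaces at once
```

Under these assumptions, four retries add 1 + 2 + 4 + 8 = 15 seconds of waiting on top of the time each failed attempt takes, which is consistent with the alert missing a 15s assertion window. With zero retries there is no backoff at all.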

Fix

Production keeps the SDK's resilient default. Add a clientOptions.maxRetries knob on provideAgent / agent(), threaded → FetchStreamTransport → createLangGraphClient → SDK callerOptions. examples/chat opts into fail-fast (0 retries) only under e2e, via a localStorage flag the spec sets with addInitScript.

  • libs/langgraph: new LangGraphClientOptions type; clientOptions on AgentOptions / AgentConfig; createLangGraphClient maps maxRetries → callerOptions.
  • examples/chat: e2eClientOptions() helper (+ unit tests); DemoShell switches to factory-form provideAgent so the flag is read at injection time (not module-load).
  • api-docs regenerated for the new public surface.
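
The flow described above, from localStorage flag to clientOptions to callerOptions, can be sketched roughly as follows. This is an illustration, not the code in this PR: the type and function names echo the description, but the signatures, the validation rules, the `FlagStorage` and `ClientConfigSketch` interfaces, and `toClientConfig` are assumptions made for the example.

```typescript
// Hypothetical shapes. Only the names LangGraphClientOptions, maxRetries,
// callerOptions, e2eClientOptions and the flag key come from the PR text.
interface LangGraphClientOptions {
  maxRetries?: number;
}

interface ClientConfigSketch {
  apiUrl: string;
  callerOptions?: { maxRetries: number };
}

// Minimal storage interface so the helper can be tested without a browser.
interface FlagStorage {
  getItem(key: string): string | null;
}

const E2E_MAX_RETRIES_KEY = 'THREADPLANE_E2E_MAX_RETRIES';

// Returns client options only when the flag holds a non-negative integer;
// otherwise returns undefined so the SDK default retry budget applies.
function e2eClientOptions(storage: FlagStorage | undefined): LangGraphClientOptions | undefined {
  const raw = storage ? storage.getItem(E2E_MAX_RETRIES_KEY) : null;
  if (raw === null || raw.trim() === '') return undefined;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < 0) return undefined;
  return { maxRetries: parsed };
}

// Maps the public knob onto the config object handed to the SDK client.
// callerOptions is omitted entirely when no override was requested.
function toClientConfig(apiUrl: string, clientOptions?: LangGraphClientOptions): ClientConfigSketch {
  if (!clientOptions || clientOptions.maxRetries === undefined) return { apiUrl };
  return { apiUrl, callerOptions: { maxRetries: clientOptions.maxRetries } };
}
```

Calling the helper inside a provider factory rather than at module load matters because the e2e spec sets the flag through addInitScript; reading it at injection time ensures the value is already in place when the options are built.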

Verification

  • examples/chat e2e full suite: 41 passed (error-handling now passes in 2.9s).
  • langgraph lint + unit tests pass; examples-chat-angular unit tests pass (37, incl. new e2eClientOptions specs).

🤖 Generated with Claude Code

… retry budget (#667 regression)

@langchain/langgraph-sdk 1.9.x (bumped from 1.7.4 in #667) wraps every
connect attempt — including the initial run stream — in AsyncCaller's
p-retry (default 4 retries, exponential backoff ~15s) before throwing a
ConnectionError. A failed stream therefore no longer fails fast: the error
alert surfaces ~15-20s late instead of immediately, which broke
examples/chat e2e `error-handling` (15s assertion window) and degrades the
real UX of a dropped connection.

Production keeps the SDK's resilient default. Add a `clientOptions.maxRetries`
knob on `provideAgent` / `agent()` (threaded → FetchStreamTransport →
createLangGraphClient → SDK `callerOptions`). examples/chat reads a localStorage
flag (`THREADPLANE_E2E_MAX_RETRIES`) to opt into fail-fast (0 retries) under
test; the e2e sets it via addInitScript so the alert surfaces immediately.

- libs/langgraph: LangGraphClientOptions type + clientOptions on AgentOptions /
  AgentConfig; createLangGraphClient maps maxRetries → callerOptions.
- examples/chat: e2eClientOptions() helper (+ unit tests); DemoShell switches to
  factory-form provideAgent so the flag is read at injection time.
- api-docs regenerated for the new public surface.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

vercel Bot commented Jun 17, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| threadplane | Ready | Preview, Comment | Jun 17, 2026 3:37am |

@blove blove merged commit 3c45540 into main Jun 17, 2026
3 checks passed
blove added a commit that referenced this pull request Jun 17, 2026
…eak app build) (#679)

`tsconfig.app.json` includes `src/**/*.ts` with `types: []`, so the Angular
app build type-checks every spec. `e2e-overrides.spec.ts` (added in #677) used
ambient `describe`/`it`/`expect`/`afterEach`, which fail under the app build
(`TS2304`/`TS2593`) — the bundle never generates, the e2e server never starts,
and all 4 `examples/chat` e2e shards time out. Matches the existing specs by
importing the globals from vitest explicitly.

Regressed main at 3c45540 (#677 merge); local `nx test` masked it because only
`nx serve`/the app build compiles specs under tsconfig.app.json.

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>